Abstract A product configurator which is complete, backtrack free and able 
to compute the valid domains at any state of the configuration can be con- 
structed by building a Binary Decision Diagram (BDD). Despite the fact that 
the size of the BDD is exponential in the number of variables in the worst case, 
BDDs have proved to work very well in practice. Current BDD-based techniques 
can only handle interactive configuration with small finite domains. In this pa- 
per we extend the approach to handle string variables constrained by regular 
expressions. The user is allowed to change the strings by adding letters at the 
end of the string. We show how to make a data structure that can perform fast 
valid domain computations given some assignment on the set of string variables. 

We first show how to do this by using one large DFA. Since this approach 
is too space consuming to be of practical use, we construct a data structure 
that simulates the large DFA and in most practical cases is much more space 
efficient. As an example, for a configuration problem on $n$ string variables with 
only one solution, in which each string variable is assigned a value of length 
$k$, the former structure will use $\Omega(k^n)$ space whereas the latter only needs 
$O(kn)$. We also show how this framework can easily be combined with the 
recent BDD techniques to allow boolean, integer and string variables in 
the configuration problem. 



1 Introduction 

Interactive configuration is a special Constraint Satisfaction Problem (CSP), 
where a user is assisted in configuration by interacting with a configurator - a 
computer program. In configuration the user repeatedly chooses an unassigned 
variable and assigns it a value until all variables are assigned. The task of the 
configurator is to state the valid choices for each of the unassigned variables 
during the configuration. The set of valid choices for an unassigned variable $x$ 
is denoted the valid domain of $x$ [HSJ+04], [SJH+04]. 

As an example consider the problem of assigning values to the variables $x_1$, $x_2$ 
and $x_3$ where $x_1 \in \{1, \ldots, 5\}$ and $x_2, x_3 \in \{1, \ldots, 10\}$ with the requirement that 
$x_1 = 1 \lor x_1 = 2 \lor x_2 = 2$ and $x_2 = x_3$. Initially the user can choose to assign 
a value from $\{1, \ldots, 5\}$ to $x_1$ or assign a value from $\{1, \ldots, 10\}$ to $x_2$ or $x_3$. 
Suppose now the user assigns 3 to $x_3$. In this case the valid domain of $x_2$ is 
$\{3\}$ and the valid domain of $x_1$ is $\{1, 2\}$. We obtain the requirement $x_2 = 3$ by 
$x_2 = x_3$ and $x_3 = 3$. Further we obtain $x_1 \in \{1, 2\}$ by $x_1 = 1 \lor x_1 = 2 \lor x_2 = 2$ 
and $x_2 = 3$. 
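
The valid domains in this small example can be checked by brute force. The following is a minimal sketch, not the BDD-based method discussed in this paper; the names `DOMAINS`, `satisfies` and `valid_domains` are chosen here for illustration only.

```python
from itertools import product

# Domains of the three variables in the introductory example.
DOMAINS = {"x1": range(1, 6), "x2": range(1, 11), "x3": range(1, 11)}

def satisfies(a):
    """Check both requirements of the example on a complete assignment."""
    return (a["x1"] == 1 or a["x1"] == 2 or a["x2"] == 2) and a["x2"] == a["x3"]

def valid_domains(partial):
    """For each unassigned variable, collect the values that extend to a solution."""
    names = list(DOMAINS)
    valid = {v: set() for v in names if v not in partial}
    for values in product(*(DOMAINS[v] for v in names)):
        a = dict(zip(names, values))
        if all(a[v] == val for v, val in partial.items()) and satisfies(a):
            for v in valid:
                valid[v].add(a[v])
    return valid

print(valid_domains({"x3": 3}))
```

Enumerating all complete assignments is exponential in the number of variables, which is why an off-line compiled representation is used instead.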

The valid domain of each unassigned variable has to be updated every time a 
value is assigned to a variable, as the assignment might make other assignments 
invalid as in the example above. The user interaction with the configurator has 
to be real-time, which in practice means that the configurator has to update the 
valid domains within 250 milliseconds [Ras00]. Calculating the valid domains 
is NP-hard since it can be used to solve 3SAT. However by making an off-line 
construction of a Binary Decision Diagram that represents the constraints we 
are able to keep the computation time polynomial in the size of the BDD. The 
BDD constructed can be exponentially large, but in practice BDDs have proved 
themselves to be far from exponential in size for many configuration problems. 

As BDDs use binary variables to represent the domains of the variables we 
normally assume small finite domains. In this paper we will consider the case 
of variables that take strings as their values, hence their domain might not be 
finite. Therefore the standard BDD approach will not be able to handle the 
problem. 

As an example suppose that a user has to fill in a form where there are many 
constraints on the data. Consider a CSP with the variables phone, country, zip 
and district along with the following constraints: 

I The prefix of phone is "+45" $\Leftrightarrow$ country = "Denmark" 

II country = "Denmark" $\Rightarrow$ zip has four digits 

III zip = "2300" $\land$ country = "Denmark" $\Leftrightarrow$ district = "Copenhagen S" 

Suppose in the CSP above that the user entered district = "Copenhagen S". 
This restricts the valid domain of zip to the singleton set {"2300"} and the 
valid domain of country to {"Denmark"} by (III). The valid domain of phone is 
decreased to the set of strings that have "+45" as a prefix by (I). 

Suppose instead that the user has entered phone = "+45 23493844". This 
decreases the valid domain of country to {"Denmark"} by (I), and the valid 
domain of zip to strings consisting of 4 digits by (II). Actually this restriction will be 
performed as soon as the user has entered "+45", since every completion of 
phone achieved by appending additional letters at the end of phone will still 
have "+45" as a prefix. 

2 Related Work 

It has recently been proposed to introduce global constraints that require that 
the variables in the CSP, considered in some order, have to belong to a regular 
language, supposing that the domain of each variable is contained in the alphabet 
of the regular language [Pes04]. This approach has this year (in 2006) been 
extended to global constraints where the variables of the CSP have to belong to 
a specified context-free grammar [QW06] [Sel06]. Both results give algorithms 
for ensuring generalized arc consistency, which corresponds to valid domains in 
the case of interactive configuration. 

Since the value of each variable they consider is one letter in the alphabet of 
the regular language, all words in the regular language they consider have some 
fixed length. 

The type of constraint considered in this paper supports variables that consist 
of any number of letters. Further it allows formulas that are multiple membership 
constraints connected by the boolean operators $\land$, $\lor$ and $\neg$. 

3 Preliminaries 

Consider a CSP stated as $C = (X, \Sigma, \mathcal{F})$. By $X = \{x_1, x_2, \ldots, x_n\}$ we denote 
the variables of the problem. By $\Sigma$ we denote an alphabet. By $\mathcal{F} = \{f_1, \ldots, f_c\}$ 
we denote formulas written using the following syntax 

$f ::= f \lor f \mid \neg f \mid$ match($x$, $\alpha$), 

where $\alpha$ is a regular expression over $\Sigma$. The expression match($x$, $\alpha$) is true if and 
only if $x \in L(\alpha)$, where $L(\alpha)$ is the language defined by the regular expression 
$\alpha$. We use $f \land g$, $f \Rightarrow g$ and $f \Leftrightarrow g$ as shortcuts for $\neg(\neg f \lor \neg g)$, $\neg f \lor g$ and 
$(f \Rightarrow g) \land (g \Rightarrow f)$ respectively. 

Regular expressions are written using the syntax: 

$\alpha ::= \alpha\alpha \mid \alpha|\alpha \mid \alpha^*$ 

listed in increasing order of strength of binding. The expression $\alpha^*$ is zero or 
more repetitions of $\alpha$. The expression $\alpha\alpha$ is the concatenation of two regular 
expressions. The expression $\alpha_1|\alpha_2$ means either $\alpha_1$ or $\alpha_2$. For instance 
$L(a|c|(abc^*)d) = L((a|c|(ab(c^*)))d)$ = {"ad", "cd", "abd", "abcd", "abccd", 
"abcccd", ...}. We further use "." as a shortcut for any letter in $\Sigma$, i.e. 
"$w_1|w_2|\cdots|w_{|\Sigma|}$" where $\{w_k \mid 1 \le k \le |\Sigma|\} = \Sigma$. 

In the example where the user had to fill in some data, the restriction (I) from 
the last section would be stated as: 

match(phone, "+45.*") $\Leftrightarrow$ match(country, "Denmark") 

where phone and country are two variables in $X$. 
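
As an illustration, restriction (I) can be evaluated on a complete assignment with an off-the-shelf regular expression engine. This is only a sketch: Python's `re` syntax differs from the syntax defined above (for example "+" must be escaped and the binding order of concatenation and alternation is different), and the helper names `match` and `restriction_I` are assumptions made for this example.

```python
import re

def match(value, pattern):
    """True iff the whole string `value` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, value) is not None

def restriction_I(rho):
    """match(phone, "+45.*") <=> match(country, "Denmark") on a complete assignment."""
    return match(rho["phone"], r"\+45.*") == match(rho["country"], "Denmark")

print(restriction_I({"phone": "+45 23493844", "country": "Denmark"}))
print(restriction_I({"phone": "+45 23493844", "country": "Sweden"}))
```

Such a check only decides whether a complete assignment is a solution; it does not by itself give the valid domains of a partial assignment.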

We denote by $\rho = \{(x_1, w_1), \ldots, (x_n, w_n)\}$ a complete assignment of the 
values $w_1, \ldots, w_n \in \Sigma^*$ to the variables $x_1, \ldots, x_n$, that is all the variables in $X$. 
We define $\Sigma^*$ in the usual way as $\{\epsilon\} \cup \Sigma \cup \Sigma^2 \cup \cdots$. The set of solutions to $C$ is 
the set of assignments to $X$ that satisfy all formulas in $\mathcal{F}$, stated formally: 

$sol(C) = \{\rho \mid \rho \models f \text{ for all } f \in \mathcal{F}\}$ 

Definition 1 (Valid Domains). The valid domain of $x_i \in X$ relative to an 
assignment $\rho$, denoted $V^{\rho}_{x_i}$, is the set of values $w \in \Sigma^*$ for which appending $w$ 
to the current assignment to $x_i$ can be extended to a solution to $C$ by appending 
appropriate strings to the values assigned to the remaining variables 
$X \setminus \{x_i\}$. Stated formally: 

$V^{\rho}_{x_i} = \{w \in \Sigma^* \mid \exists \rho' : \rho'(x_i) = w \land \rho\rho' \in sol(C)\}$ 

where $\rho$ and $\rho'$ are assignments to $X$ and the concatenation $\rho\rho'$ is defined by 
$\rho\rho' = \{(x_1, \rho(x_1)\rho'(x_1)), \ldots, (x_n, \rho(x_n)\rho'(x_n))\}$ 

The following theorem will be proved in the next section: 



Theorem 1. For any $x \in X$ and any assignment $\rho$ to $X$ it holds that $V^{\rho}_{x}$ is a 
regular language. 

The goal of this paper is to construct a data structure that, based on a CSP 
$C = (X, \Sigma, \mathcal{F})$, supports three operations: 

Build($C$) that constructs the data structure from $C$, 

Append($x_i$, $w$) that updates $\rho$ by setting $\rho(x_i)$ to $\rho(x_i)w$ and makes the 
data structure conform to the new $\rho$, and 

ValidDomain($x_i$) that returns a regular expression that corresponds to 
the valid domain of $x_i$ on $\rho$, that is a regular expression $\alpha$ for which $L(\alpha) = V^{\rho}_{x_i}$. 

As the two latter operations have to be used during user interaction, the goal is 
to make these two operations run as fast as possible without using too much 
space. 

One might consider a fourth operation Complete($x_i$) that indicates that 
there will be no more updates to the value of some string variable which will 
imply an additional reduction of the valid domains. In the context of form 
validation this corresponds to the event that the user hits the return key or leaves 
the current input field. In Section 10.3 we show that this extra functionality 
easily can be supported by the three operations already mentioned. 

In order to check whether $w \in L(\alpha)$ we use a deterministic finite automaton 
(DFA). We denote DFAs deciding the regular expressions that occur in $\mathcal{F}$ by 
the name match-DFAs. 

4 A Solution based on a single DFA 

In this section we will prove that $V^{\rho}_{x_i}$ is a regular language. However we want to 
do more than that. We will present a construction of a DFA that for any $x_i \in X$ 
and any assignment $\rho$ to $X$ can be turned into a DFA deciding $V^{\rho}_{x_i}$. This proves 
that $V^{\rho}_{x_i}$ is a regular language, but the data structure that will be presented in 
this section uses too much space to be of any practical use. However it gives us a 
good starting point for making a smaller efficient data structure supporting the 
operations Build, Append and ValidDomain mentioned in the last section. 

The DFA we want to construct is denoted $M_C$, and is the DFA deciding a 
language we denote $L_C$. We will now spend some time on defining the language 
$L_C$. The basic property of $L_C$ is that: 

$$w \in L_C \iff \rho_w \in sol(C) \qquad (1)$$

where $w$ is a word that induces the assignment $\rho_w$, where the meaning of induces 
will be defined in (3). 

Intuitively we make the alphabet of $L_C$, denoted $\Sigma_C$, consist of all possible 
Append-operations. More formally stated, $\Sigma_C \subseteq (\Sigma \cup \{\epsilon\})^n$ where each letter in 
$\Sigma_C$ only contains one element different from $\epsilon$, that is: 

$$\Sigma_C = \bigcup_{i=1}^{n} \bigcup_{w \in \Sigma} \{(\epsilon, \ldots, \epsilon, w, \epsilon, \ldots, \epsilon)\} \qquad (2)$$

where $w$ occurs at position $i$. Every word $w$ in $L_C$ is a concatenation of letters from $\Sigma_C$, that is $L_C \subseteq \Sigma_C^*$. We 
say that: 

$$w = w_1 \cdots w_k \text{ induces } \rho_w = \{(x_1, w_{1,1} \cdots w_{k,1}), \ldots, (x_n, w_{1,n} \cdots w_{k,n})\} \qquad (3)$$

where $w_{l,i}$ denotes the $i$th element in the letter $w_l \in \Sigma_C$, $1 \le l \le k$ and 
$1 \le i \le n$, and $\rho_w$ is an assignment to $X$. Note that for any $w = w_1 \cdots w_k$, every 
word $w'$ that consists of exactly the letters $w_1, \ldots, w_k$ permuted in a way that 
maintains the ordering of $w_{1,i}, \ldots, w_{k,i}$ for every $1 \le i \le n$ satisfies $\rho_{w'} = \rho_w$. 
Hence every assignment $\rho_w$ corresponds to exactly $\frac{(\sum_{i=1}^{n} |\rho_w(x_i)|)!}{\prod_{i=1}^{n} |\rho_w(x_i)|!}$ different 
words. For convenience we will in the following, when we use $w$ and $\rho_w$ in the 
same calculations, implicitly assume that $w$ induces $\rho_w$ as defined in (3). 

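Example 1. Consider the CSP where $X = \{x_1, x_2, x_3\}$ and $\Sigma = \{a, b\}$. In this 
case 

$\Sigma_C = \{(a, \epsilon, \epsilon), (b, \epsilon, \epsilon), (\epsilon, a, \epsilon), (\epsilon, b, \epsilon), (\epsilon, \epsilon, a), (\epsilon, \epsilon, b)\}$ 

and for instance the word $w = (a, \epsilon, \epsilon)(\epsilon, \epsilon, a)(b, \epsilon, \epsilon)(a, \epsilon, \epsilon)$ induces the 
assignment $\rho_w = \{(x_1, aba), (x_2, \epsilon), (x_3, a)\}$, and so do for instance $w' = 
(a, \epsilon, \epsilon)(b, \epsilon, \epsilon)(a, \epsilon, \epsilon)(\epsilon, \epsilon, a)$ and $w'' = (a, \epsilon, \epsilon)(b, \epsilon, \epsilon)(\epsilon, \epsilon, a)(a, \epsilon, \epsilon)$. In the case 
of $w$, (1) becomes: 

$(a, \epsilon, \epsilon)(\epsilon, \epsilon, a)(b, \epsilon, \epsilon)(a, \epsilon, \epsilon) \in L_C \iff \{(x_1, aba), (x_2, \epsilon), (x_3, a)\} \in sol(C)$ 

Note however that for instance $w''' = (b, \epsilon, \epsilon)(a, \epsilon, \epsilon)(\epsilon, \epsilon, a)(a, \epsilon, \epsilon)$ does not 
induce $\rho_w$, since $\rho_{w'''} = \{(x_1, baa), (x_2, \epsilon), (x_3, a)\}$. 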
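
The notion of a word inducing an assignment can be made concrete with a short sketch. It assumes that $\epsilon$ is represented by the empty Python string and that variables are named `x1`, `x2`, ...; the function names are illustrative only. The second function counts the interleavings of the per-variable letter sequences, i.e. the number of words that induce a given assignment.

```python
from math import factorial, prod

EPS = ""  # the empty string plays the role of epsilon

def induces(word, n):
    """Concatenate the letters of `word` component-wise, as in (3)."""
    values = [""] * n
    for letter in word:
        # every letter of Sigma_C has exactly one non-empty component
        assert len(letter) == n and sum(c != EPS for c in letter) == 1
        for i, c in enumerate(letter):
            values[i] += c
    return {f"x{i + 1}": v for i, v in enumerate(values)}

def count_inducing_words(rho):
    """Number of words over Sigma_C that induce the assignment rho."""
    lengths = [len(v) for v in rho.values()]
    return factorial(sum(lengths)) // prod(factorial(l) for l in lengths)

w = [("a", EPS, EPS), (EPS, EPS, "a"), ("b", EPS, EPS), ("a", EPS, EPS)]
w1 = [("a", EPS, EPS), ("b", EPS, EPS), ("a", EPS, EPS), (EPS, EPS, "a")]
w3 = [("b", EPS, EPS), ("a", EPS, EPS), (EPS, EPS, "a"), ("a", EPS, EPS)]

print(induces(w, 3))
print(induces(w1, 3) == induces(w, 3))
print(count_inducing_words(induces(w, 3)))
```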

Hence if we can make a DFA that decides $L_C$, this DFA can be used to decide 
for any assignment $\rho$ whether $\rho \in sol(C)$. In the following we will construct 
such a DFA and we will show how, based on this construction, we can for any $V^{\rho}_{x_i}$ 
make a DFA that decides the language $V^{\rho}_{x_i}$, thereby showing that $V^{\rho}_{x_i}$ is a 
regular language. Before we begin the construction we formally define a DFA: 

Definition 2 (DFA). A deterministic finite automaton DFA $= (Q, \Sigma, \delta, s, A)$ 
has a finite set of states $Q$, a transition function $\delta : Q \times \Sigma \to Q$, where $\Sigma$ is 
some alphabet, a starting state $s \in Q$ and a set of accepting states $A \subseteq Q$. We 
use $\delta(q, w)$ as a shorthand for $\delta(\cdots \delta(\delta(q, w_1), w_2), \cdots, w_l)$, where $(w_1, \ldots, w_l)$ 
are the letters of $w \in \Sigma^*$. If $q = s$ we write $\delta(q, w)$ as $\delta(w)$. 

Definition 3 (Reachability in a DFA). In a DFA $M = (Q, \Sigma, \delta, s, A)$ a state $q$ 
is reachable from a state $p$ by the string $w \in \Sigma^*$ if and only if $\delta(p, w) = q$. In 
particular any state is reachable from itself by the empty string. The state $q$ is 
reachable from $p$ if and only if $q$ is reachable from $p$ by some string. We say 
that a state is reachable in $M$ if it is reachable from the source. 
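
Definitions 2 and 3 translate directly into code. The sketch below assumes a total transition function stored as a dictionary keyed by (state, letter); the class and method names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: set
    alphabet: set
    delta: dict      # maps (state, letter) -> state and must be total
    start: object
    accepting: set

    def run(self, word, state=None):
        """Extended transition function delta(state, word); defaults to the source."""
        q = self.start if state is None else state
        for letter in word:
            q = self.delta[(q, letter)]
        return q

    def accepts(self, word):
        return self.run(word) in self.accepting

    def reachable(self, p):
        """All states q that are reachable from p, including p itself."""
        seen, stack = {p}, [p]
        while stack:
            q = stack.pop()
            for letter in self.alphabet:
                r = self.delta[(q, letter)]
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

# A DFA for the language {"ab"} over {a, b}; state 3 is a rejecting sink.
SIGMA = {"a", "b"}
delta = {(q, c): 3 for q in range(4) for c in SIGMA}
delta[(0, "a")] = 1
delta[(1, "b")] = 2
m = DFA(states={0, 1, 2, 3}, alphabet=SIGMA, delta=delta, start=0, accepting={2})
print(m.accepts("ab"), m.accepts("a"))
```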

In the rest of this paper we will use the notation $p \leadsto q$ to denote that $q$ is 
reachable from $p$. We will also assume that $M_\gamma = (Q_\gamma, \Sigma_\gamma, \delta_\gamma, s_\gamma, A_\gamma)$ for any 
subscript $\gamma$. 

In the rest of this section we will do the following. First we construct the 
DFA $M_C$ based on the match-DFAs of $\mathcal{F}$. We then reduce the DFA $M_C$ by 
replacing the alphabet and defining $A_C$. We thereby obtain that $M_C$ decides the 
language $L_C$. Finally we show 
how we can turn $M_C$ into an automaton deciding $V^{\rho}_{x_i}$ by changing the source 
and the alphabet in $M_C$. 

After this brief overview we begin the actual construction. We start by 
constructing $M_C$. This construction can be divided into three steps: 

1. For every match-expression match($x_i$, $\alpha$) that occurs in $\mathcal{F}$ we construct 
a match-DFA that decides the regular language $L(\alpha)$. We denote these 
match-DFAs $M_1, \ldots, M_m$, where $M_j$ is the match-DFA deciding the regular 
expression in the $j$th match-expression in $\mathcal{F}$, assuming some order 
on the match-expressions in $\mathcal{F}$. We define the mapping $I : \{1, \ldots, m\} \to 
\{1, \ldots, n\}$ such that $x_{I_j}$ is the variable that occurs in the $j$th match-expression. 

2. For every state $q$ in the DFAs $M_1, \ldots, M_m$ we add a self-looping transition 
on the empty string $\epsilon \notin \Sigma$, i.e. the transition $\delta(q, \epsilon) = q$. This results in 
DFAs as the ones shown in Figure 1. 

3. We construct a DFA $M_C = (Q_C, \Sigma_C, \delta_C, s_C, A_C)$ defined by: 

$Q_C = Q_1 \times \cdots \times Q_m$ 

$s_C = (s_1, \ldots, s_m)$ 

$\delta_C((q_1, \ldots, q_m), (w_1, \ldots, w_n)) = (\delta_1(q_1, w_{I_1}), \ldots, \delta_m(q_m, w_{I_m}))$ where 
$(q_1, \ldots, q_m) \in Q_C$ and $(w_1, \ldots, w_n) \in \Sigma_C$ 

$A_C = \{(q_1, \ldots, q_m) \in Q_C \mid \{(y_1, (q_1 \in A_1)), \ldots, (y_m, (q_m \in A_m))\} \models 
f[\text{match}(x_{I_j}, \alpha_j) \leftarrow y_j] \text{ for all } f \in \mathcal{F}\}$ where by $f[\text{match}(x_{I_j}, \alpha_j) \leftarrow y_j]$ 
we mean the formula $f$ where every match-expression of the form 
match($x_{I_j}$, $\alpha_j$) is replaced by the boolean variable $y_j$. 

The definitions of $Q_C$ and $s_C$ should be straightforward. The definitions of 
$\delta_C$ and $A_C$ need some explanation. 

In order to explain the definition of $\delta_C$ we break it down into four steps: 

1. Since every state in $M_C$ is a vector of $m$ states, a straightforward definition 
of $\delta_C$ would be on the alphabet $\Sigma^m$ of vectors of $m$ letters, making 
every transition correspond to taking exactly one step in each of the $m$ 
underlying DFAs. 

2. For our use we need to ensure that we follow transitions on the same 
letter in every set of DFAs that evaluate the same variable. This is 
ensured by using the mapping $I : \{1, \ldots, m\} \to \{1, \ldots, n\}$ defined earlier 
in this section. The mapping $I$ is used to map every vector of letters 
$(w_1, \ldots, w_n) \in \Sigma^n$ to a vector $(w_{I_1}, \ldots, w_{I_m}) \in \Sigma^m$ where $w_{I_i} = w_{I_j}$ if 
the two match-DFAs $M_i$ and $M_j$ evaluate the same variable. 

3. By extending the alphabet $\Sigma^n$ to $(\Sigma \cup \{\epsilon\})^n$ we make it possible to make 
movements that correspond to appending a letter to the value of a subset 
of the variables. 

4. Finally we replace the alphabet $(\Sigma \cup \{\epsilon\})^n$ by $\Sigma_C$ as defined in (2), that 
is we remove all letters from the alphabet that do not correspond to 
appending a letter to the value of exactly one variable. 

The above four steps are described in terms of the alphabet and not in terms of 
transitions. However by exchanging letters above we implicitly mean that the 
definitions of the transitions are exchanged as well. If we for instance exchanged 
a letter $w_1 \in \Sigma_1$ by $w_2 \in \Sigma_2$, the transition $\delta(p, w_1) = q$ would be exchanged by 
the transition $\delta(p, w_2) = q$. 

Example 2. Consider the example of a DFA $M_C$ in Figure 2 based on the 
two match-DFAs from Figure 1. In Figure 2 we have indicated by arrows all transitions 
corresponding to taking a single move in both of the two match-DFAs. 

For all DFAs in this paper accepting states are indicated by double circles and 
the source is assumed to be the leftmost state in the graph. Further each state is 
labeled with the regular expression corresponding to the state, that is every state 
$q$ is labeled with the regular expression $\alpha$ for which $w \in L(\alpha) \iff \delta(w) = q$. 
When the alphabet of the DFA is a subset of $\Sigma^2$ we label the states by two 
regular expressions $\alpha$ and $\beta$ such that $w_1 \in L(\alpha) \land w_2 \in L(\beta) \iff \delta(w_1 w_2) = q$. 

If the two match-DFAs are based on match-expressions on different variables, 
$\delta_C$ is only defined for the solid transitions in figure 1. If the two DFAs are based 
on match-expressions on the same variable, $\delta_C$ is only defined for the dashed 
transitions. In the latter case only two states are reachable from the source of 
the DFA. 




Figure 1: DFAs on L("ab") and L("ac"). 

In order to explain the definition of $A_C$ we define the transition function $\delta_C$ 
as: 

$$\delta_C(w) = (\delta_1(\rho_w(x_{I_1})), \ldots, \delta_m(\rho_w(x_{I_m}))) \qquad (4)$$

where $w$ is a word that induces $\rho_w$. Note that this definition complies with the 
definition of $\delta$ in the definition of a DFA though it differs in syntax. 

Our goal is to make $M_C$ decide $L_C$. In order for this to be the case $A_C$ has 
to satisfy 

$$\delta_C(w) \in A_C \iff w \in L_C \iff \rho_w \in sol(C) \qquad (5)$$

that is 

$A_C = \{q \in Q_C \mid \exists w \in \Sigma_C^* : \delta_C(w) = q \land \rho_w \in sol(C)\}$ 

Note by (5) that for each $q \in Q_C$ the statement $\rho_w \in sol(C)$ either holds for 
all $w$ for which $\delta_C(w) = q$ or for none of these words. This is due to the fact that 
all the words $w$ for which $\delta_C(w) = q$ correspond to the same tuple of states in the 
match-DFAs, $(q_1, \ldots, q_m) = q$, and hence will evaluate the match-expressions in 
$\mathcal{F}$ to the same boolean values. 

To check for some $q \in Q_C$ whether there exists a $w \in \Sigma_C^*$ for which $\delta_C(w) = q$ 
is a simple task, but checking whether $\rho_w \in sol(C)$ holds for every $w$ for which 
$\delta_C(w) = q$ requires some explanation. 

Every $j$th match-expression in $\mathcal{F}$ evaluated by the match-DFA $M_j$ is a term 
that is either true or false depending on whether the current state in $M_j$ 
is accepting or not. Every state $q$ in $M_C$ corresponds to the combination of 
states $(q_1, \ldots, q_m) = q$ in the match-DFAs $M_1, \ldots, M_m$. Because of this we 
might intuitively consider every match-expression as a boolean variable. Let us 
denote the boolean variables $y_1, \ldots, y_m$, and let $y_j$ correspond to the $j$th match-expression 
in $\mathcal{F}$ for $1 \le j \le m$. Observe that every state $q = (q_1, \ldots, q_m) \in Q_C$ 
can be conceived as a complete assignment of boolean values to $y_1, \ldots, y_m$ 
by acceptance/rejection of $q_1, \ldots, q_m$ by $M_1, \ldots, M_m$ respectively. If this complete 
assignment satisfies every formula $f \in \mathcal{F}$, then we have for every $w$ with $\delta_C(w) = q$ 
that $\rho_w \in sol(C)$, otherwise we have for every $w$ with $\delta_C(w) = q$ that $\rho_w \notin sol(C)$. 

Figure 2: The DFA $M_C$ built based on the DFAs corresponding to L("ab") and 
L("ac") shown in Figure 1. 

We restate this in formal terms. We first define the assignment 

$\tau_q = \{(y_1, (q_1 \in A_1)), \ldots, (y_m, (q_m \in A_m))\}$ 

to the boolean variables $y_1, \ldots, y_m$. We let $\alpha_j$ be the regular expression in the 
$j$th match-expression and obtain 

$\rho_w \in sol(C) \iff \tau_{\delta_C(w)} \models f[\text{match}(x_{I_j}, \alpha_j) \leftarrow y_j] \text{ for all } f \in \mathcal{F}$ 

where by $f[\text{match}(x_{I_j}, \alpha_j) \leftarrow y_j]$ we mean the formula $f$ where every match-expression 
of the form match($x_{I_j}$, $\alpha_j$) is replaced by the boolean variable $y_j$. 
Using equation (5) this can be rewritten as: 

$A_C = \{q \in Q_C \mid \tau_q \models f[\text{match}(x_{I_j}, \alpha_j) \leftarrow y_j] \text{ for all } f \in \mathcal{F}\}$ 

Checking for some $q$ whether $\tau_q \models f[\text{match}(x_{I_j}, \alpha_j) \leftarrow y_j]$ can be done by 
simply plugging the values of $\tau_q$ into the boolean formula $f$ and checking whether 
this makes $f$ true or false. 
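
The acceptance test for a product state can be sketched as follows. The representation is an assumption made for this illustration: each formula is given as a Python function of the tuple $(y_1, \ldots, y_m)$, and the states of the match-DFAs are named by hypothetical labels.

```python
def tau(q, accepting_sets):
    """Boolean assignment tau_q: y_j is true iff q_j is accepting in M_j."""
    return tuple(qj in Aj for qj, Aj in zip(q, accepting_sets))

def is_accepting(q, accepting_sets, formulas):
    """q is in A_C iff tau_q satisfies every formula with match_j replaced by y_j."""
    y = tau(q, accepting_sets)
    return all(f(y) for f in formulas)

# Hypothetical instance with three match-DFAs; states are named by labels.
# f1 = y1 or y2, f2 = y3.
accepting_sets = [{"ab"}, {"abc"}, {"ab", "abd"}]
formulas = [lambda y: y[0] or y[1], lambda y: y[2]]

print(is_accepting(("ab", "ab", "ab"), accepting_sets, formulas))
print(is_accepting(("a", "abc", "abc"), accepting_sets, formulas))
```

The test only inspects which component states are accepting, which is why all words leading to the same product state agree on membership in $L_C$.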
Having explained $\delta_C$ and $A_C$ we now consider how to turn $M_C$ into a DFA 
that decides $V^{\rho}_{x_i}$. We start by stating the definition of valid domains (Definition 
1) in terms of the language $L_C$ as: 

$V^{\rho}_{x_i} = \{w \in \Sigma^* \mid \exists w_C \in L_C : \rho_{w_C}(x_i) = \rho(x_i) w\}$ 

If we want to change $M_C$ such that it decides $V^{\rho}_{x_i}$ we have to do two things: 

1. Set the source in $M_C$ to $\delta_C(w)$, where $w \in \Sigma_C^*$ is a word corresponding 
to $\rho$. 

2. Project the alphabet on $x_i$, that is, replace every letter $w = (w_1, \ldots, w_n) \in 
\Sigma_C$ by $w_i \in \Sigma \cup \{\epsilon\}$. 

Note that the second step turns all transitions on $w$ for which $w_i = \epsilon$ into $\epsilon$-transitions, 
hence we have made a non-deterministic automaton on the alphabet 
$\Sigma$ deciding $V^{\rho}_{x_i}$. Using basic automata theory we obtain a corresponding DFA 
and the corresponding regular expression. 

Example 3. Consider the example $C = (X, \Sigma, \mathcal{F})$ where $X = \{x_1, x_2\}$ and $\mathcal{F} = 
\{f_1, f_2\}$, where $f_1$ = match($x_1$, "ab") $\lor$ match($x_2$, "abc") and $f_2$ = match($x_2$, "abd*"). 
We construct the match-DFAs $M_1$, $M_2$ and $M_3$ on the regular languages L("ab"), 
L("abc") and L("abd*") respectively. To each state in $M_1$, $M_2$ and $M_3$ we add 
$\epsilon$-transitions that are self-loops. The resulting DFAs are shown in Figure 3. 

We now begin the construction of the DFA $M_C$. Since $Q_C = Q_1 \times Q_2 \times Q_3$ this 
DFA will have $|Q_1| \cdot |Q_2| \cdot |Q_3| = 3 \cdot 4 \cdot 3 = 36$ states. However not all the states 
are reachable from $s_C$ since $\delta_C$ is only defined on $\Sigma_C = \bigcup_{w \in \Sigma} (\{(w, \epsilon)\} \cup \{(\epsilon, w)\})$. 
The remaining H states and the transitions in Mc are shown in Figure 4. 

We can check for each state in $M_C$ whether it is accepting by checking if its 
corresponding states in the match-DFAs $M_1, \ldots, M_m$ yield an assignment to the 
match-expressions by acceptance/rejection that makes $\mathcal{F}$ true. In this example 
only the state labeled "(ab,abd*)" is accepting. 

Suppose now we want to calculate $V^{\rho}_{x_2}$ where $\rho = \{(x_1, \text{"a"}), (x_2, \text{"ab"})\}$. We 
first set $s_C = \delta_C((\text{"a"}, \epsilon)(\epsilon, \text{"a"})(\epsilon, \text{"b"}))$ and then replace every letter $w = (w_1, w_2) \in 
\Sigma_C$ by $w_2 \in \Sigma \cup \{\epsilon\}$, that is every $(\epsilon, w_2)$ by $w_2$ and every $(w_1, \epsilon)$ by $\epsilon$. The resulting 
non-deterministic automaton and its corresponding DFA are shown in Figure 
5. In this example we get $V^{\{(x_1, \text{"a"}), (x_2, \text{"ab"})\}}_{x_2}$ = "d*". 




Figure 3: The upper two DFAs stem from match($x_1$, "ab") and 
match($x_2$, "abc") respectively. The lower DFA stems from match($x_2$, "abd*"). 

Figure 4: A DFA $M_C$ built on the formula $f_1$ = match($x_2$, "abc") $\lor$ 
match($x_1$, "ab") $\land$ match($x_2$, "abd*"). The transition-labels 1:w and 2:w where 
$w \in \Sigma$ correspond to the assignments $\rho = \{(x_1, w), (x_2, \epsilon)\}$ and $\{(x_1, \epsilon), (x_2, w)\}$ 
respectively. For simplicity the states corresponding to rejection of any of the 
match-expressions are not included. A DFA with all states is shown in Figure 
7 in the Appendix. 

Figure 5: To the left: The non-deterministic automaton deciding the valid domain 
$V^{\rho}_{x_2}$ where $\rho = \{(x_1, \text{"a"}), (x_2, \text{"ab"})\}$, derived from the DFA $M_C$ in Figure 4. To 
the right: the corresponding DFA. 

The size of the valid domains DFA. Though both updating and computing 
valid domains will be fast using this solution, the size of the DFA is too large 
for the solution to be of any use for larger problems. As an example, for a problem 
on $n$ variables containing a single solution $\{(x_1, w_1), \ldots, (x_n, w_n)\}$ where 
$|w_i| = k$ for all $1 \le i \le n$, the DFA $M_C$ will contain $\Omega(k^n)$ states. The construction 
that we will achieve at the end of this paper will contain $O(kn)$ states. 
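
As a rough check of these bounds, assume (for illustration only; this instance is not spelled out in the text) that the single-solution problem is given by the $n$ formulas match($x_i$, $w_i$) and that each match-DFA $M_i$ has one state per prefix of $w_i$ plus one rejecting sink, so $|Q_i| = k + 2$. Since every letter of $\Sigma_C$ moves exactly one component, every combination of component states is reachable, which gives:

```latex
|Q_C| \;=\; \prod_{i=1}^{n} |Q_i| \;=\; (k+2)^n \;=\; \Omega(k^n),
\qquad
\sum_{i=1}^{n} |Q_i| \;=\; n(k+2) \;=\; O(kn).
```

For $k = 10$ and $n = 5$ this is $12^5 = 248832$ product states against $60$ states when the DFAs are kept separate.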

4.1 Simulating the valid domains DFA 

In order to make a less space consuming construction we separate the valid 
domains DFA into smaller DFAs, that is instead of joining all the match-DFAs 
into the DFA $M_C$ we only join match-DFAs on the same variable. The drawback 
of this approach is that we cannot encode the boolean logic of $\mathcal{F}$ into the DFAs 
on the variables, as each of these DFAs only constitutes a partial solution to 
$\mathcal{F}$. We therefore build a BDD on the boolean logic of $\mathcal{F}$. In this BDD every 
match-expression is considered as a boolean variable. Given any combination 
of states in the DFAs on the variables we can compute the value of $A_C$ on 
the fly by restricting the BDD to the acceptances and rejections of the various 
values. In this way we are able to simulate the DFA $M_C$ by a much smaller 
data structure. This structure will perform well in terms of updating values, 
deciding $L_C$ and reporting $V^{\rho}_{x_i}$. Performing this construction is the main goal of this 
paper. 

In Section 5 we describe how to encode a set of DFAs on the same 
variable into a Multi-DFA that can simulate many DFAs simultaneously on the 
same string. In Section 6 we encode into every state $q$ in the Multi-DFA which 
combinations of acceptance/rejection by the simulated match-DFAs can be 
reached by following transitions corresponding to some word from $q$ in the Multi-DFA. 
In Section 7 we construct the BDD taking care of the boolean logic in $\mathcal{F}$ as 
the constraint problem $\mathcal{D}$, where every match-expression is replaced by a boolean 
variable. In Section 8 we present the algorithms Build($C$), Append($x_i$, $w$) and 
ValidDomain($x_i$). Finally in Section 10 we consider various extensions to the 
data structure. 

5 DFAs and Multi-DFAs 

By the construction of the DFA Mc in the previous section we have ensured 
two properties: 

1. All small DFAs on the same variable are synchronized 

2. All states that cannot be a valid solution are removed 

In order to reduce the space consumption of the DFAs we will present a solution 
that only joins match-DFAs on the same variable. By doing this we ensure (1). 
In the last section we could ensure (2) simply by minimizing the DFA. We do 
not have this option if we separate DFAs on different variables, since the DFAs 
will not have the logic of $\mathcal{F}$ encoded in their structure. This problem will be 
addressed in Section 7. 

Since DFAs on a single variable often will be the combination of more than 
one match-DFA, and since the value of one variable is not enough to determine 
whether or not $\mathcal{F}$ is satisfied, we cannot use acceptance and rejection in the 
same way as in Section 7. We therefore replace the notion of accepting states 
by a bit-vector, denoted the acceptance value, assigned to each state. It contains true 
or false for each of the match-DFAs, depending on whether that match-DFA accepts or rejects in the current 
state. This is the idea behind the following generalization of the definition of a 
DFA. 

Definition 4. A multi-DFA (MDFA) $(Q, \Sigma, \delta, s, a)$ of acceptance size $k$ has a 
finite set of states $Q$, a transition function $\delta : Q \times \Sigma \to Q$, where $\Sigma$ is some 
alphabet, a starting state $s$ and an acceptance value $a(q) \in \mathbb{B}^k$ for every $q \in Q$. 
The acceptance value of a word $w$ is defined as $a(\delta(w)) \in \mathbb{B}^k$. 

Note that the definition above assigns exactly one acceptance value to every 
finite string in $\Sigma^*$. Note further that an MDFA with acceptance size 1 is a 
standard DFA with the set of accepting states $\{q \mid a(q) = (\text{true})\}$. 

As we in the rest of this paper only use the alphabet $\Sigma$ given by the CSP 
$C = (X, \Sigma, \mathcal{F})$ we will from now on not state the alphabet $\Sigma$ explicitly in our 
definitions of DFAs and MDFAs. In other words we use $(Q, \delta, s, A)$ and $(Q, \delta, s, a)$ 
as shortcuts for $(Q, \Sigma, \delta, s, A)$ and $(Q, \Sigma, \delta, s, a)$ for DFAs and MDFAs respectively, 
where $\Sigma$ is the alphabet given by $C$. 

Construction of an MDFA. We might build an MDFA by slightly modifying 
the construction of the DFA $M_C$. However this might make the intermediate 
structure very large. Instead we use a simple approach making a simultaneous 
DFS in the DFAs that have to be joined, as described in the next two pieces of 
pseudocode. We let $\mu$, $Q$, $\delta$, $s$, $a$, $k$ and $Q_i$, $\delta_i$, $s_i$, $A_i$ for $1 \le i \le k$ be globals. 

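RecConstructMDFA($q_1, \ldots, q_k$) 

1 if $\mu(q_1, \ldots, q_k)$ is defined 

2     then return $\mu(q_1, \ldots, q_k)$ 

3 create a new state $q \notin Q$ 

4 $Q \leftarrow Q \cup \{q\}$ 

5 $\mu(q_1, \ldots, q_k) \leftarrow q$ 

6 $a(q) \leftarrow ((q_1 \in A_1), \ldots, (q_k \in A_k))$ 

7 for each $w \in \Sigma$ 

8     do $\delta(q, w) \leftarrow$ RecConstructMDFA($\delta_1(q_1, w), \ldots, \delta_k(q_k, w)$) 

9 return $q$ 

ConstructMDFA($DFA_1, \ldots, DFA_k$) 

1 $Q \leftarrow \delta \leftarrow a \leftarrow \mu \leftarrow \emptyset$ 

2 $s \leftarrow$ RecConstructMDFA($s_1, \ldots, s_k$) 

3 return $(Q, \delta, s, a)$ 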

The function $\mu$ is used to ensure that a new state in the MDFA corresponding 
to a position $(q_1, \ldots, q_k)$ in the DFAs is created only once. We only create new 
states (by proceeding to Line 3) if $\mu(q_1, \ldots, q_k)$ is undefined, which is the case 
if and only if $(q_1, \ldots, q_k)$ has not been visited before. Otherwise we return the 
previously created state that is assigned to $\mu(q_1, \ldots, q_k)$ to the caller in Line 2. 
In Line 6 we by "$q_j \in A_j$" mean true if $q_j \in A_j$ and false otherwise. 

For instance the requirements match($x_1$, "abc") and match($x_1$, "abd*") on 
$x_1$ will result in the MDFA drawn in Figure 6. 

Acceptance values 

1 : (false, false) 

2 : (false, false) 

3 : (false, true) 

4 : (true, false) 

5 : (false, true) 

Figure 6: The MDFA of the regular expressions: "abc" and "abd*" 

Note that the state (true, true) corresponding to match($x_1$, "abc") $\land$ match($x_1$, "abd*") 
is not contained in the MDFA due to the fact that $L(\text{"abc"}) \cap L(\text{"abd*"}) = \emptyset$. 

Note also that this construction could easily be adapted to construct $M_C$ if 
we use the alphabet $\Sigma_C$ and follow the corresponding transitions in the DFAs. 
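
The simultaneous depth-first construction can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: each input DFA is assumed to be a triple (total transition dictionary, start state, accepting set), the helper `total` and the sink name `"dead"` are introduced here for convenience, and the two example DFAs for "abc" and "abd*" are written by hand.

```python
def total(partial, states, alphabet, dead):
    """Complete a partial transition table by sending missing moves to `dead`."""
    return {(q, c): partial.get((q, c), dead)
            for q in states | {dead} for c in alphabet}

def construct_mdfa(dfas, alphabet):
    """dfas: list of (delta, start, accepting). Returns (Q, delta, s, a)."""
    mu, Q, delta, a = {}, [], {}, {}

    def rec(qs):
        if qs in mu:                       # this tuple of states was visited before
            return mu[qs]
        q = len(Q)                         # create a fresh MDFA state
        Q.append(q)
        mu[qs] = q
        a[q] = tuple(qi in acc for qi, (_, _, acc) in zip(qs, dfas))
        for c in sorted(alphabet):         # follow the same letter in every DFA
            delta[(q, c)] = rec(tuple(d[(qi, c)] for qi, (d, _, _) in zip(qs, dfas)))
        return q

    s = rec(tuple(start for _, start, _ in dfas))
    return Q, delta, s, a

SIGMA = {"a", "b", "c", "d"}
abc = (total({(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}, {0, 1, 2, 3}, SIGMA, "dead"), 0, {3})
abd = (total({(0, "a"): 1, (1, "b"): 2, (2, "d"): 2}, {0, 1, 2}, SIGMA, "dead"), 0, {2})
Q, delta, s, a = construct_mdfa([abc, abd], SIGMA)

def value_of(word):
    """Acceptance value of a word in the constructed MDFA."""
    q = s
    for c in word:
        q = delta[(q, c)]
    return a[q]

print(len(Q), value_of("abc"), value_of("abdd"))
```

With total input DFAs the result has one more state than the five listed for Figure 6, namely the joint rejecting sink. The recursion mirrors the pseudocode; for large automata an explicit stack would avoid Python's recursion limit.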

We want to make sure that the constructed MDFA is minimal in 
the number of states it contains. In order to prove this we need to define 
what it means to have a minimal number of states. This can be done by naturally 
generalizing the definition of a minimized DFA to a minimized MDFA. 

Definition 5. An MDFA is minimized if all states in the MDFA are reachable 
from $s$ and no pair of states in the MDFA are equivalent. For any pair of 
states $p, q \in Q$, $p$ and $q$ are equivalent by definition if and only if for all words 
$w \in \Sigma^*$ : $a(\delta(p, w)) = a(\delta(q, w))$. 

Lemma 1. If the DFAs given as input to ConstructMDFA are minimized 
then the constructed MDFA will be minimized. 

Proof. We first note that all states in $Q$ are reachable. This is due to the fact 
that every state created except $s$ will be the result of a recursive call made at line 
8. Hence every created state in the MDFA will be assigned to a $\delta(q, w)$ for some state 
$q$ reachable from $s$ and some $w \in \Sigma$. 

We then prove that no pair of states in the constructed MDFA is equivalent 
if every $DFA_1, \ldots, DFA_k$ is minimized. Consider any pair of distinct states $p, q \in 
Q$. Suppose $\mu(p_1, \ldots, p_k) = p$ and $\mu(q_1, \ldots, q_k) = q$. Since $p \neq q$ we know by 
the initial check on lines 1-2 that $(p_1, \ldots, p_k) \neq (q_1, \ldots, q_k)$. Hence for some 
$1 \le i \le k$ we have $p_i, q_i \in Q_i$ for which $p_i \neq q_i$. Since $DFA_i$ is minimized we 
know that $p_i$ is not equivalent to $q_i$, which implies that there exists a $w \in \Sigma^*$ 
for which $a(\delta_i(p_i, w)) \neq a(\delta_i(q_i, w))$. This implies that $a(\delta(\mu(p_1, \ldots, p_k), w)) \neq 
a(\delta(\mu(q_1, \ldots, q_k), w))$, which is the same as $a(\delta(p, w)) \neq a(\delta(q, w))$. Hence $p$ 
and $q$ are not equivalent. □ 



6 Reachable acceptance values 

As we noticed earlier, the main problem we face by not joining all
match-expressions into one big DFA is that we lack the logic. We now present a
notion we call reachable acceptance values. The reachable acceptance values of a
state $p$ in an MDFA is the set containing exactly the acceptance values of every
state $q$ that can be reached from $p$ by following zero or more transitions.
Formally:

$$R(p) = \{a(q) \mid p \leadsto q\}, \quad \text{where } p, q \in Q \qquad (6)$$

Example 4. The states in the MDFA in Figure 6 have the following reachable
acceptance values:

$R(1) = R(2) = R(3) = \{(\text{true}, \text{false}), (\text{false}, \text{true}), (\text{false}, \text{false})\}$,
$R(4) = \{(\text{true}, \text{false}), (\text{false}, \text{false})\}$ and
$R(5) = \{(\text{false}, \text{true}), (\text{false}, \text{false})\}$.

The goal in this section is to compute and store the set of reachable
acceptance values for each of the states in an MDFA. When this set is stored we
know in advance, at any state of the MDFA, which acceptance values we might
end up in. Hence we can use this to constrain the logical structure by only
allowing values that can be reached from the current state. The exact meaning
of "constraining the logical structure" will become clear in Section 7.

Having defined the set of reachable acceptance values we now consider how 
to compute the set for every state in an MDFA in an efficient way. 

6.1 Computing the reachable acceptance values

We start by pointing out two obvious facts about the reachable acceptance
values $R$ for the states in an MDFA:

Fact 1: If a state $p$ has transitions to exactly the states $\{q_1, \ldots, q_l\}$
then $R(p) = \{a(p)\} \cup R(q_1) \cup \ldots \cup R(q_l)$.

Fact 2: If two states $p, q$ belong to the same strongly connected component
we have $R(p) = R(q)$.

A strongly connected component in an MDFA $(Q, \delta, s, a)$ is defined as a set
of states $C \subseteq Q$ such that for any $p \in C$ it holds that $p \leadsto q$
and $q \leadsto p$ if and only if $q \in C$. Calculating the strongly connected
components in an MDFA can easily be done in time linear in the size of the MDFA
[CLRS01].

ComputeReachableAcceptanceStates(M)

 1  Let $\mathcal{C}$ be the set of strongly connected components in $Q$
 2  for each $C_1, C_2 \in \mathcal{C}$
 3      do if $\delta(p, w) = q$ for some $p \in C_1$, $q \in C_2$, $w \in \Sigma$
 4            then $\Gamma(C_1) \leftarrow \Gamma(C_1) \cup \{C_2\}$
 5  for each $C \in \mathcal{C}$
 6      do $R'(C) \leftarrow \bigcup_{q \in C} \{a(q)\}$                      ▷ Ensure Fact 1
 7  for each $C_1 \in \mathcal{C}$ in reverse topological order
 8      do $R'(C_1) \leftarrow R'(C_1) \cup \bigcup_{C_2 \in \Gamma(C_1)} R'(C_2)$    ▷ Ensure Fact 2
 9  for each $C \in \mathcal{C}$
10      do for each $q \in C$
11            do $R(q) \leftarrow R'(C)$
12  return $R$

We assume that $M = (Q, \delta, s, a)$ is an MDFA and that initially
$R = R' = \mathcal{C} = \Gamma = \emptyset$. In Lines 2-4 we construct the neighbor
function $\Gamma(C)$ mapping any strongly connected component to the set of
"children" of that component. In Lines 5-6 every $R'(C)$ is assigned the set of
acceptance values of the states contained in $C$. In Lines 7-8, for every
strongly connected component $C_1$, the set $R'(C_1)$ is extended with the union
of all $R'(C_2)$ for which $C_1 \leadsto C_2$ in $\mathcal{C}$. Note that the
topological order on $\mathcal{C}$ is well defined since the graph of strongly
connected components is a DAG [CLRS01]. Finally, in Lines 9-11 the reachable
acceptance values of the strongly connected components are assigned to the
states in $Q$.
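
The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary encoding of an MDFA, the function name `reachable_acceptance_values`, and the six-state example (a guess at the MDFA of Figure 6 for "abc" and "abd*", with an explicit dead state 6) are all hypothetical. The sketch uses Tarjan's algorithm, which emits strongly connected components in reverse topological order, so the neighbor function and the final loops collapse into a single pass.

```python
def reachable_acceptance_values(states, delta, accept):
    """Return {state: frozenset of acceptance values reachable from it}.

    states: iterable of hashable states
    delta:  dict mapping (state, letter) -> state
    accept: dict mapping state -> acceptance value (tuple of booleans)
    """
    succ = {q: set() for q in states}
    for (p, _letter), q in delta.items():
        succ[p].add(q)

    # Tarjan's algorithm (recursive, fine for small automata).
    index, low, on_stack, stack = {}, {}, set(), []
    comp_of, components = {}, []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp_of[w] = len(components)
                comp.append(w)
                if w == v:
                    break
            components.append(comp)

    for q in succ:
        if q not in index:
            visit(q)

    # Components arrive sinks first, so every child component is already done.
    r_comp = []
    for cid, comp in enumerate(components):
        values = {accept[q] for q in comp}           # values inside the component
        for q in comp:
            for w in succ[q]:
                if comp_of[w] != cid:
                    values |= r_comp[comp_of[w]]     # values of child components
        r_comp.append(values)
    return {q: frozenset(r_comp[comp_of[q]]) for q in succ}


# Hypothetical MDFA on x2 for ("abc", "abd*"); state 6 is an assumed dead state.
F, T = False, True
live = {(1, "a"): 2, (2, "b"): 3, (3, "c"): 4, (3, "d"): 5, (5, "d"): 5}
delta = {(q, c): live.get((q, c), 6) for q in range(1, 7) for c in "abcd"}
accept = {1: (F, F), 2: (F, F), 3: (F, T), 4: (T, F), 5: (F, T), 6: (F, F)}
R = reachable_acceptance_values(range(1, 7), delta, accept)
```

Under these assumptions the computed sets for states 1 to 5 agree with Example 4.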

7 The boolean logic of $\mathcal{F}$

We now return to the problem of representing the boolean logic of $\mathcal{F}$.
In Section 4 the boolean logic was contained in the DFA, in the sense that
$\mathcal{F}$ was encoded in the DFA $M_C$ constructed in Section 4 by whether a
state was an accepting state or not.

Since we have divided the match-expressions in $\mathcal{F}$ into MDFAs on each
of the variables $x \in X$, no MDFA can in itself decide whether $\mathcal{F}$ is
satisfied or not. This is why the MDFAs are neither accepting nor rejecting.
However, if we pick a state from each of the MDFAs, this set of states
corresponds to a complete assignment to the variables in $X$. Such a set is
accepting if and only if evaluating the match-expressions by the
rejection/acceptance of the match-DFAs used during the construction of the
MDFAs, on the states corresponding to the states in the MDFAs, makes
$\mathcal{F}$ true, exactly as in Section 4. We call such a set an accepting set.
Furthermore, a state in an MDFA is valid if it occurs in some accepting set. If
it occurs in no accepting set it is invalid. We observe that every accepting
set of states corresponds to an accepting state in $M_C$.

In order to represent the boolean logic in $\mathcal{F}$ we define a CSP
$\mathcal{D} = (\mathcal{Y}, \mathbb{B}, \mathcal{G})$ based on $C$. The construction of
the problem has many similarities with the calculation of the set of accepting
states in $M_C$ in Section 4. The variables in $\mathcal{Y}$ are the same as the
y-variables in Section 4, and all the constraints
$\{f[match(x_{i_j}, \alpha_j) \leftarrow y_j] \mid f \in \mathcal{F}\}$ are constraints
in $\mathcal{G}$. However, we need some extra constraints in $\mathcal{G}$ and another
way to index the y-variables in $\mathcal{Y}$, but basically this section is just
an extension of the techniques used in Section 4. We now present the notation
that will help us describe the implementation of the three operations Build,
Append and ValidDomain in the next section.

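Let $\mathcal{D} = (\mathcal{Y}, \mathbb{B}, \mathcal{G})$ be a CSP, where
$\mathcal{Y} = \{y_1, \ldots, y_m\}$ is a set of boolean variables and $\mathcal{G}$
is a set of boolean constraints on the values that can be assigned to
$\mathcal{Y}$. Let $\phi = \{(y_1, b_1), \ldots, (y_m, b_m)\}$, where
$y_1, \ldots, y_m \in \mathcal{Y}$ and $b_1, \ldots, b_m \in \mathbb{B}$, denote a
complete assignment of the variables in $\mathcal{Y}$ to boolean values, or in
short: an assignment to $\mathcal{Y}$. We define the solution to $\mathcal{D}$ by:

$$sol(\mathcal{D}) = \{\phi \mid \phi \models \mathcal{G}\}$$

where $\phi$ is an assignment to $\mathcal{Y}$. Further we let the formulas
$\{f[match(x_{i_j}, \alpha_j) \leftarrow y_j] \mid f \in \mathcal{F}\}$ be a part of
$\mathcal{G}$.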

For the use of this section we define $y^i_j$ as the y-variable in $\mathcal{Y}$
replacing the $j$th of the match-expressions on the variable $x_i$, for
$1 \le i \le n$ and $1 \le j \le k_i$, where $k_i$ is the number of
match-expressions on $x_i$ that occur in $\mathcal{F}$. Using this notation we can
restate $\mathcal{Y}$ as

$$\mathcal{Y} = \{y^1_1, \ldots, y^1_{k_1}, y^2_1, \ldots, y^2_{k_2}, \ldots, y^n_1, \ldots, y^n_{k_n}\} \qquad (7)$$

Using the shortcuts $y^i = (y^i_1, \ldots, y^i_{k_i})$ and
$b^i = (b^i_1, \ldots, b^i_{k_i})$ for every $1 \le i \le n$, where
$b^i_1, \ldots, b^i_{k_i} \in \mathbb{B}$, we define:

$$\phi(y^i) = (\phi(y^i_1), \ldots, \phi(y^i_{k_i}))$$

and

$$y^i = b^i \Leftrightarrow \bigwedge_{1 \le j \le k_i} y^i_j = b^i_j$$

We further define the shortcut:

$$y^i \in B^i \Leftrightarrow \bigvee_{b^i \in B^i} y^i = b^i$$

where $B^i \subseteq \mathbb{B}^{k_i}$. We further denote the $j$th element in the
acceptance value of a state $q_i$ in the MDFA on $x_i$ by $a^i_j(q_i)$ and the
entire acceptance value of $q_i$ by
$a^i(q_i) = (a^i_1(q_i), \ldots, a^i_{k_i}(q_i))$, and define $R^i_j(p_i)$ as
$\{a^i_j(q_i) \mid p_i \leadsto q_i\}$ and $R^i(p_i)$ as
$\{a^i(q_i) \mid p_i \leadsto q_i\}$.

Every assignment $\rho$ to $X$ corresponds to the assignment $\phi$ to
$\mathcal{Y}$ where the truth value of $y^i_j$ is the truth value of the $j$th
match-expression of $x_i$ when evaluated on $\rho(x_i)$. More formally we say that

$$\rho \text{ induces } \phi_\rho = \{(y^1, a^1(\delta_1(\rho(x_1)))), \ldots, (y^n, a^n(\delta_n(\rho(x_n))))\}$$

We want to ensure that

$$\rho \in sol(C) \Leftrightarrow \phi_\rho \in sol(\mathcal{D}) \qquad (8)$$

where $\rho$ is the assignment that induces $\phi_\rho$.

The rightward implication of (8) can be satisfied by ensuring

$$\{f[match(x_i, \alpha^i_j) \leftarrow y^i_j] \mid f \in \mathcal{F}\}$$

by including it in $\mathcal{G}$, which is quite similar to what we did in Section 4.

The leftward implication in (8) was ensured in Section 4 by the definition of
$A_C$ and the fact that the only accepting states reachable from the source
of $M_C$ were the states $q \in A_C$ where
$q = (q_1, \ldots, q_m) = (\delta_1(w_{i_1}), \ldots, \delta_m(w_{i_m}))$ for some
$w_1, \ldots, w_n \in \Sigma^*$, where $\delta$ denotes transitions in the
match-DFAs. In this section we need to ensure the leftward implication by
adding the constraint:

$$y^i \in R^i(s_i) \quad \text{for all } 1 \le i \le n \qquad (9)$$

to $\mathcal{G}$. From this we get that if

$$\mathcal{G} = \{f[match(x_i, \alpha^i_j) \leftarrow y^i_j] \mid f \in \mathcal{F}\} \cup \bigwedge_{1 \le i \le n} y^i \in R^i(s_i)$$

then (8) holds.

We define the valid domains of $y^i$ by

$$V_{y^i} = \{b^i \in \mathbb{B}^{k_i} \mid \exists \phi \in sol(\mathcal{D}) : \phi(y^i) = b^i\}$$

Note that this definition is different from the definition of $V^{\rho}_{x_i}$,
but is quite similar to the standard definition of valid domains as e.g. in
[TH06]. This version, however, is specialized for valid domains on the empty
assignment and is a projection of the valid solutions onto a vector of
variables from $\mathcal{Y}$.

Recall the shortcut $\rho\rho'$ defined by
$\rho\rho' = \{(x_1, \rho(x_1)\rho'(x_1)), \ldots, (x_n, \rho(x_n)\rho'(x_n))\}$
used in the definition of $V^{\rho}_{x_i}$ in Definition 1. Using this shortcut
and that

$$\rho \in sol(C) \Leftrightarrow \phi_\rho \in sol(\mathcal{D}) \quad \text{where } \phi_\rho = \{(y^1, a^1(\delta_1(\rho(x_1)))), \ldots, (y^n, a^n(\delta_n(\rho(x_n))))\}$$

we get:



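$$
\begin{aligned}
V^{\rho}_{x_i} &= \{w \in \Sigma^* \mid \exists \rho' : \rho\rho' \in sol(C) \wedge \rho'(x_i) = w\} \\
&= \{w \in \Sigma^* \mid \exists \rho' : \rho\rho' \in sol(C) \wedge \rho\rho'(x_i) = \rho(x_i)w\} \\
&= \{w \in \Sigma^* \mid \exists \rho' : \phi_{\rho\rho'} \in sol(\mathcal{D}) \wedge \rho\rho'(x_i) = \rho(x_i)w\} \\
&= \{w \in \Sigma^* \mid \exists \rho' : \phi_{\rho\rho'} \in sol(\mathcal{D}) \wedge \phi_{\rho\rho'}(y^i) = a^i(\delta_i(\rho(x_i)w))\} \\
&= \{w \in \Sigma^* \mid \exists \phi \in sol(\mathcal{D}) \wedge \phi(y^i) = a^i(\delta_i(\rho(x_i)w))\} \\
&= \{w \in \Sigma^* \mid \exists \phi \in sol(\mathcal{D}) \wedge \phi(y^i) = b^i \wedge b^i = a^i(\delta_i(\rho(x_i)w))\} \\
&= \{w \in \Sigma^* \mid a^i(\delta_i(\rho(x_i)w)) \in V_{y^i}\}
\end{aligned}
$$

By this we know that when $\rho \in sol(C) \Leftrightarrow \phi_\rho \in sol(\mathcal{D})$
is ensured we can compute $V^{\rho}_{x_i}$ using only the MDFA $M_i$ and
$V_{y^i}$. To obtain a DFA $(Q, \Sigma, \delta, s, A)$ deciding $V^{\rho}_{x_i}$
based on the MDFA $M_i = (Q_i, \Sigma, \delta_i, a^i)$ on the variable $x_i$ we can
do the following:

• set $Q = Q_i$ and $\delta = \delta_i$

• set $A = \{q_i \in Q_i \mid a^i(q_i) \in V_{y^i}\}$

• set $s = \delta_i(\rho(x_i))$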

Note that this is very close to what was done in Section 4. The main
difference is that instead of making states accepting/rejecting during
preprocessing, we construct $A$ during the valid domain computation by using
$V^{\rho}_{x_i} = \{w \mid a^i(\delta_i(\rho(x_i)w)) \in V_{y^i}\}$. Further, we have
no need to change the alphabet, which was needed in Section 4.
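
The three steps above can be sketched in Python. This is an illustration only: the dictionary encoding, the name `valid_domain_accepts`, and the example MDFA on $x_2$ for "abc" and "abd*" (with an assumed dead state 6) are hypothetical and not taken from the paper's implementation. The accepting set is derived on demand from the valid acceptance values, and the run starts in the state reached by the current prefix.

```python
def valid_domain_accepts(delta, accept, current_state, valid_values, word):
    """Decide whether `word` is a valid continuation.

    delta:         dict mapping (state, letter) -> state
    accept:        dict mapping state -> acceptance value (tuple of booleans)
    current_state: state reached by the prefix typed so far
    valid_values:  set of acceptance values that still occur in some solution
    """
    # A = { q | a(q) is a valid acceptance value }, built at query time.
    accepting = {q for q, value in accept.items() if value in valid_values}
    state = current_state
    for letter in word:
        state = delta[(state, letter)]
    return state in accepting


# Hypothetical MDFA on x2 for ("abc", "abd*"); state 6 is an assumed dead state.
F, T = False, True
live = {(1, "a"): 2, (2, "b"): 3, (3, "c"): 4, (3, "d"): 5, (5, "d"): 5}
delta = {(q, c): live.get((q, c), 6) for q in range(1, 7) for c in "abcd"}
accept = {1: (F, F), 2: (F, F), 3: (F, T), 4: (T, F), 5: (F, T), 6: (F, F)}

# Suppose only the value (false, true) is valid, i.e. only "abd*" may match.
valid = {(F, T)}
print(valid_domain_accepts(delta, accept, 1, valid, "abdd"))
```

Because the accepting set is recomputed from `valid_values`, the same transition table can be reused unchanged as the set of valid acceptance values shrinks.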

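Example 5. Consider the CSP $C = (X, \Sigma, \mathcal{F})$, where
$X = \{x_1, x_2\}$, $\mathcal{F} = \{f_1, f_2\}$,
$f_1 = match_1(x_2, \text{"abc"}) \vee match_2(x_1, \text{"a"})$,
$f_2 = match_3(x_2, \text{"abd*"})$ and $x_1 = x_2 = \epsilon$ (assume that
match-expressions are ordered in increasing order of their subscript). We
define the CSP $\mathcal{D} = (\mathcal{Y}, \mathbb{B}, \mathcal{G})$. In $\mathcal{D}$ we
have $\mathcal{Y} = \{y^1_1, y^2_1, y^2_2\}$, and disregarding the requirement (9)
we have $\mathcal{G} = \{g_1, g_2\}$ where $g_1 = y^2_1 \vee y^1_1$ and
$g_2 = y^2_2$. We have the following facts:

$$
\begin{aligned}
sol(\mathcal{D}) = \{ &\{(y^1_1, \text{false}), (y^2_1, \text{true}), (y^2_2, \text{true})\}, \\
&\{(y^1_1, \text{true}), (y^2_1, \text{false}), (y^2_2, \text{true})\}, \\
&\{(y^1_1, \text{true}), (y^2_1, \text{true}), (y^2_2, \text{true})\}\} \\
R^1(s_1) = \{ &(\text{true}), (\text{false})\} \\
R^2(s_2) = \{ &(\text{false}, \text{true}), (\text{true}, \text{false})\}
\end{aligned}
$$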

We now impose the requirement (9), that is

$$(y^1 \in R^1(s_1)) \cup (y^2 \in R^2(s_2))$$

by adding it to $\mathcal{G}$. This requirement has earlier been defined as:

$$\mathcal{G} \leftarrow \mathcal{G} \cup \Big(\bigvee_{b \in R^1(s_1)} y^1_1 = b_1\Big) \cup \Big(\bigvee_{b \in R^2(s_2)} y^2_1 = b_1 \wedge y^2_2 = b_2\Big)$$

which corresponds to the requirements
$\phi(y^1_1) \in \{(\text{true}), (\text{false})\}$ and
$\phi(y^2_1, y^2_2) \in \{(\text{false}, \text{true}), (\text{true}, \text{false})\}$
respectively, for any $\phi \in sol(\mathcal{D})$. The latter constraint removes the
assignments
$\{(y^1_1, \text{false}), (y^2_1, \text{true}), (y^2_2, \text{true})\}$ and
$\{(y^1_1, \text{true}), (y^2_1, \text{true}), (y^2_2, \text{true})\}$
from $sol(\mathcal{D})$. All constraints implied by the MDFAs are now contained in
$\mathcal{D}$.

We now have $V_{x_i} = \{w \in \Sigma^* \mid a^i(\delta_i(w)) \in V_{y^i}\}$. From
this we get

$$V_{x_1} = \{w \in \Sigma^* \mid a(\delta_1(w)) \in \{(\text{true})\}\} = L(\text{"a"})$$

and

$$V_{x_2} = \{w \in \Sigma^* \mid a(\delta_2(w)) \in \{(\text{false}, \text{true})\}\} = L(\text{"abd*"})$$
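
The boolean part of Example 5 is small enough to check by brute force. The following Python sketch enumerates all assignments to the three y-variables instead of using a BDD; the variable names `y1_1`, `y2_1`, `y2_2` and the helper functions are hypothetical encodings of $y^1_1$, $y^2_1$, $y^2_2$, $g_1$ and $g_2$, and the sets `R1` and `R2` are copied from the example.

```python
from itertools import product

NAMES = ("y1_1", "y2_1", "y2_2")


def g1(phi):
    # g1 = y^2_1 or y^1_1  (from f1 after replacing the match-expressions)
    return phi["y2_1"] or phi["y1_1"]


def g2(phi):
    # g2 = y^2_2  (from f2)
    return phi["y2_2"]


# sol(D) when requirement (9) is disregarded.
assignments = [dict(zip(NAMES, bits)) for bits in product((False, True), repeat=3)]
sol = [phi for phi in assignments if g1(phi) and g2(phi)]

# Reachable acceptance values of the start states, as stated in Example 5.
R1 = {(True,), (False,)}
R2 = {(False, True), (True, False)}

# sol(D) after adding requirement (9).
sol9 = [
    phi for phi in sol
    if (phi["y1_1"],) in R1 and (phi["y2_1"], phi["y2_2"]) in R2
]

# Valid domains of y^1 and y^2: projections of the remaining solutions.
V_y1 = {(phi["y1_1"],) for phi in sol9}
V_y2 = {(phi["y2_1"], phi["y2_2"]) for phi in sol9}
print(len(sol), sol9, V_y1, V_y2)
```

Three assignments satisfy $g_1$ and $g_2$; requirement (9) leaves exactly one, whose projections are the sets $\{(\text{true})\}$ and $\{(\text{false}, \text{true})\}$ used for the two valid domains.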

8 The Algorithms 

In this section we present the three algorithms Build($C$), Append($x_i$, $w$)
and ValidDomain($x_i$). The first algorithm, Build, constructs a data structure
that is used by Append and ValidDomain. In all algorithms we assume that
$\mathcal{D}, M_1, \ldots, M_n, R^1, \ldots, R^n, a^1, \ldots, a^n, X$ and $\rho$ are
global variables. We assume that $V_{y^i}$ is available. We further assume that
initially $\rho \leftarrow \{(x_1, \epsilon), \ldots, (x_n, \epsilon)\}$,
$\mathcal{G}_2 = \emptyset$ and $k_1, \ldots, k_n = 0$.

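Build($C$)

 1  $\mathcal{G}_1 \leftarrow \mathcal{F}$
 2  for $i \leftarrow 1$ to $n$
 3      do for each $j$th match-expression on the variable $x_i$ occurring in $\mathcal{G}_1$ as $match(x_i, \alpha^i_j)$
 4            do replace $match(x_i, \alpha^i_j)$ in $\mathcal{G}_1$ by a new variable $y^i_j$
 5               $k_i \leftarrow k_i + 1$
 6               Build a DFA $M'_{i,j}$ on $L(\alpha^i_j)$
 7         $y^i = (y^i_1, \ldots, y^i_{k_i})$
 8         $M_i \leftarrow$ ConstructMDFA($M'_{i,1}, \ldots, M'_{i,k_i}$)
 9         $R^i \leftarrow$ ComputeReachableAcceptanceStates($M_i$)
10  $\mathcal{Y} = \{y^1_1, \ldots, y^1_{k_1}, y^2_1, \ldots, y^n_1, \ldots, y^n_{k_n}\}$
11  for $i \leftarrow 1$ to $n$
12      do $\mathcal{G}_2 \leftarrow \mathcal{G}_2 \cup (y^i \in R^i(s_i))$
13  $\mathcal{D} = (\mathcal{Y}, \mathbb{B}, \mathcal{G}_1 \cup \mathcal{G}_2)$
14  if $V_{y^1} = \emptyset$
15      then error "No feasible solutions"
16  for $i \leftarrow 1$ to $n$
17      do for each $q_i \in Q_i$
18            do if $\{a^i(q_i)\} \cap V_{y^i} = \emptyset$
19                  then $a^i(q_i) = \emptyset$
20               $R^i(q_i) \leftarrow R^i(q_i) \cap V_{y^i}$
21         Minimize $M_i$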



Lines 1-10 construct the first half of $\mathcal{G}$ based on $\mathcal{F}$. Lines
11-12 construct the second half of $\mathcal{G}$, and Line 13 defines
$\mathcal{D}$. Lines 14-15 check for a feasible solution to $C$; the reason for
using $V_{y^1}$ instead of $sol(\mathcal{D})$ is that we have not required that
$sol(\mathcal{D})$ is available to us. Lines 16-21 try to reduce the size of the
data structure by removing the acceptance values from $a$ and $R$ that cannot
lead to a valid solution. Note that Lines 18-19 might set $a(q) = \emptyset$,
which is not valid according to the definition of an MDFA. However, we use the
value in the pseudocode to indicate that this acceptance value can never be
part of a solution to $\mathcal{D}$.

ValidDomain($x_i$)

1  $A \leftarrow \emptyset$
2  for each $q_i \in Q_i$
3      do if $a^i(q_i) \in V_{y^i}$
4            then $A \leftarrow A \cup \{q_i\}$
5  $\alpha \leftarrow$ the regular expression corresponding to the DFA $(Q_i, \Sigma, \delta_i, s_i, A)$
6  return $\alpha$

This algorithm constructs a DFA on the MDFA $M_i$ accepting
$V^{\rho}_{x_i} = \{w \in \Sigma^* \mid a(\delta_i(w)) \in V_{y^i}\}$ and returns the
regular expression corresponding to the constructed DFA. Of course we might
consider other ways to indicate the valid domains than by returning a regular
expression. This will be discussed in Section 10.

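Append($x_i$, $w$)

1  $\mathcal{G}' \leftarrow \mathcal{G} \cup (y^i \in R^i(\delta_i(s_i, w)))$
2  if $\mathcal{G}' \models \bot$
3      then error "invalid append"
4  $\rho(x_i) \leftarrow \rho(x_i)w$
5  $s_i \leftarrow \delta_i(s_i, w)$
6  $\mathcal{G} \leftarrow \mathcal{G}'$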

We append the letter $w$ to $\rho(x_i)$, and add a constraint to $\mathcal{G}$ in
order to remove the assignments on $\mathcal{Y}$ that are no longer possible to
attain by any $\rho$.
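
The effect of Append can be illustrated with a small Python sketch. It deliberately swaps the BDD for an explicit set of remaining solutions and recomputes reachable acceptance values by a plain graph search, so it shows the logic rather than the data structure proposed here. The names `append` and `reachable_values`, the single-variable formula match(x, "abc") ∨ match(x, "abd*"), and the six-state MDFA with dead state 6 are assumptions made for the example.

```python
def reachable_values(delta, accept, start):
    """Acceptance values of all states reachable from `start` (zero or more steps)."""
    seen, frontier = {start}, [start]
    while frontier:
        p = frontier.pop()
        for (src, _letter), dst in delta.items():
            if src == p and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return {accept[q] for q in seen}


def append(solutions, i, state, letter, delta, accept):
    """Append `letter` to variable i; return (new state, remaining solutions).

    solutions: set of tuples (b^1, ..., b^n), one acceptance vector per variable
    Raises ValueError if no solution survives, mirroring "invalid append".
    """
    nxt = delta[(state, letter)]
    allowed = reachable_values(delta, accept, nxt)
    remaining = {phi for phi in solutions if phi[i] in allowed}
    if not remaining:
        raise ValueError("invalid append")
    return nxt, remaining


# Hypothetical single-variable problem: match(x, "abc") or match(x, "abd*").
F, T = False, True
live = {(1, "a"): 2, (2, "b"): 3, (3, "c"): 4, (3, "d"): 5, (5, "d"): 5}
delta = {(q, c): live.get((q, c), 6) for q in range(1, 7) for c in "abcd"}
accept = {1: (F, F), 2: (F, F), 3: (F, T), 4: (T, F), 5: (F, T), 6: (F, F)}
solutions = {((T, F),), ((F, T),)}  # solutions left after requirement (9)

state, sols = 1, set(solutions)
for letter in "abd":
    state, sols = append(sols, 0, state, letter, delta, accept)
print(state, sols)
```

After "ab" both solutions are still attainable; appending "d" rules out the one in which "abc" matches, and a further "c" would be rejected as an invalid append.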

9 Implementation 

In the algorithms we have supposed that we have a data structure on
$\mathcal{D}$ that supports two operations:

1. Adding constraints to $\mathcal{G}$.

2. Computing $V_{y^i}$ for every $1 \le i \le n$.

This could be done by filtering on $\mathcal{G}$ using one of the many filtering
approaches (see e.g. [Dec03]). However, in the setting of interactive
configuration, where values are assigned one by one and very fast valid domain
computations have to be available, encoding the constraints by a BDD seems to
be the obvious choice. We also choose to represent $R(q_i)$ as a BDD encoding
of the constraint $y^i \in R^i(q_i)$. Hence setting
$\mathcal{G} \leftarrow \mathcal{G} \cup (y^i \in R(q_i))$ can be done by setting
$BDD(\mathcal{G}) \leftarrow BDD(\mathcal{G}) \wedge BDD(y^i \in R(q_i))$, where
$BDD(\mathcal{H})$ is the BDD representation of the conjunction of the set of
boolean formulas in $\mathcal{H}$.

The algorithm used to minimize MDFAs in Build is a direct generalization of
the one presented in [AHU74]. It runs in $O(|Q| \log |Q|)$ time, where $Q$ is the
set of states in the non-minimal MDFA.

The algorithm that transforms a DFA into a regular expression can be found
in [HMU01]. It runs in $O(|\delta| \cdot |\alpha|)$, where $|\delta|$ is the number
of transitions in the DFA and $|\alpha|$ is the number of characters in the
resulting regular expression.

10 Extensions 

10.1 Encompassing previous BDDs in the current context 

Since $\mathcal{D}$ is encoded as a BDD we can easily provide support for boolean
and integer variables, allowing the same operations as usual in on-line
configuration. For instance we would be able to accept constraints such as
$x_1 \neq 7 \vee x_1 \wedge match(x_2, \text{"7*"}) \wedge match(x_3, \text{"abc*"})$
on the variables $x_1, x_2, x_3$. Currently we cannot model equality of two
strings, but it could easily be added.

One might also choose to encode an integer as a string in some cases. For
instance a regular expression can be used to determine whether an integer of
arbitrary length is a multiple of 2 or a multiple of 3.

10.2 k-shortest path 

If we are to present the valid domain of a variable to the user, i.e. to help
the user in completing a string, a regular expression might not be very
intuitive, especially if the concept of regular expressions is unknown to the
user. Hence one might consider other strategies.

One idea would be to output only the shortest text completion. This can be
done in $O(|Q| \log |Q| + |\delta|)$ using Dijkstra's algorithm, where $|Q|$ and
$|\delta|$ are the number of states and transitions in the MDFA respectively. We
can also find the $k$ shortest paths in $O(|\delta| + |Q| \log |Q| + k)$ time
[Epp94] and find the $k$ shortest simple paths in
$O(k|Q|(|\delta| + |Q| \log |Q|))$ [Yen72].

If more than one acceptance value is valid one might consider to output the 
shortest path to each of the valid acceptance values one at a time. 
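
A shortest completion can be sketched in Python as below. Note the swapped-in technique: since every transition consumes exactly one letter, the sketch uses breadth-first search rather than Dijkstra's algorithm, which gives the same shortest words for unit edge costs. The function name `shortest_completion`, the dictionary encoding, and the example MDFA for "abc" and "abd*" with dead state 6 are hypothetical.

```python
from collections import deque


def shortest_completion(delta, accept, start, valid_values, alphabet):
    """Return a shortest word leading from `start` to a state whose
    acceptance value is in `valid_values`, or None if none is reachable."""
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        state, word = queue.popleft()
        if accept[state] in valid_values:
            return word
        for letter in alphabet:
            nxt = delta[(state, letter)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + letter))
    return None


# Hypothetical MDFA on x2 for ("abc", "abd*"); state 6 is an assumed dead state.
F, T = False, True
live = {(1, "a"): 2, (2, "b"): 3, (3, "c"): 4, (3, "d"): 5, (5, "d"): 5}
delta = {(q, c): live.get((q, c), 6) for q in range(1, 7) for c in "abcd"}
accept = {1: (F, F), 2: (F, F), 3: (F, T), 4: (T, F), 5: (F, T), 6: (F, F)}

print(shortest_completion(delta, accept, 1, {(T, F)}, "abcd"))
```

Calling the function once per valid acceptance value yields one shortest completion for each of them, as suggested above.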

10.3 Completing a string 

We might want to support two kinds of updates:

• Appending a letter $w$ to a string $x_i \in X$ as described earlier

• Completing a string $x_i \in X$

To complete a variable $x_i$ is in some way to state that no more letters will
be appended to $\rho(x_i)$. In the example of input field validation, this could
be stated by the user hitting the return key or leaving a text field. We
support this second update as the action of appending a special letter
$eol \notin \Sigma$ to $\rho(x_i)$, and disallowing the appending of letters to
$\rho(x_i)$ if the last letter of $\rho(x_i)$ is $eol$.

10.4 Making savings by a simple heuristic 

One might consider making a simple reduction: rewriting expressions like
$match(x, \alpha) \vee match(x, \beta)$ to $match(x, \alpha \cup \beta)$ and
similarly $match(x, \alpha) \wedge match(x, \beta)$ to
$match(x, \alpha \cap \beta)$. These rewritings may lead to a large reduction in
space, as the DFA will only need to distinguish 2 cases instead of 4.

10.5 Supporting initial domain of X

In this paper we have assumed that the initial domain of any $x \in X$ is
$\Sigma^*$. In practice we might want to constrain the initial domain by a
regular expression. For instance we might choose to constrain the zip code to
only contain digits from the very start by adding
$match(zip, \text{"(0|1|2|3|4|5|6|7|8|9)*"})$ to $\mathcal{G}$ as an initial
constraint.

11 Future Work 

An obvious extension would be to explore whether it is possible to achieve the 
same functionality with languages that are more expressive than the regular 
languages. For instance we might investigate if we can handle context-free 
languages [HMU01]. 

Another thought that might be pursued is whether the input language used
to declare the constraints of $\mathcal{F}$ is appropriate.
Formally it is perfect, as every regular language can be expressed as a regular
expression. However, the length and complexity of these expressions may make
it cumbersome to express even simple constraints. Consider for instance the
constraint that $x$ is in the regular language of natural numbers divisible by
3. This regular language can be modeled by a DFA with 3 states and nine
transitions. In our current input language this will have to be expressed as
$f = match(x,$ "([0369]*|([147]|([258][0369]*[258]))[0369]*([258]|([147][0369]*[147]))|([258]|([147][0369]*[147]))[0369]*([147]|([258][0369]*[258])))*"$)$.
This suggests that we might consider other ways to model the DFA constraints
than the match-expression. An ad hoc solution to the problem stated above
could be to allow expressions in the input language of the form
"$x$ modulo $y = z$" where $x, y, z \in \mathbb{Z}$. But we can easily construct
similar problems that will cause other difficulties. Hence a challenge is to
consider how the input language can be designed so that it is easy to express
the numerous problems that have nice DFAs but are horrible to express as
regular expressions.
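
For comparison, the three-state DFA for divisibility by 3 is very compact when written down directly. The Python sketch below is an illustration with hypothetical names (`DIGIT_CLASS`, `DELTA`, `divisible_by_three`); it groups the ten digits into three classes by their remainder modulo 3, which gives the nine transitions mentioned above, and treats the empty string as accepted because the start state is accepting.

```python
# Each digit is mapped to its class: 0 for [0369], 1 for [147], 2 for [258].
DIGIT_CLASS = {d: int(d) % 3 for d in "0123456789"}

# State = remainder of the digits read so far, modulo 3.
# Nine transitions: 3 states times 3 digit classes.
DELTA = {(state, cls): (state + cls) % 3 for state in range(3) for cls in range(3)}


def divisible_by_three(digits):
    """Run the DFA on a string of decimal digits; state 0 is start and accepting."""
    state = 0
    for ch in digits:
        state = DELTA[(state, DIGIT_CLASS[ch])]
    return state == 0


print(divisible_by_three("123456"), divisible_by_three("1000"))
```

The transition table works because appending a digit $d$ to a number $n$ gives $10n + d$, and $10n + d \equiv n + d \pmod 3$.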

Another problem is how to enable a user, who in most cases will have little or
no acquaintance with regular expressions, to state constraints that can be
enforced by the data structure.

Figure 7: A valid domains DFA built on the formulas
$f_1 = match(x_2, \text{"abc"}) \vee match(x_1, \text{"a"})$,
$f_2 = match(x_2, \text{"abd*"})$. Transitions 1:* and 2:* mean transitions on all
other letters that cannot follow any transition on the first or second
variable respectively. Dashed states are states from which no accepting state
is reachable. If the DFA is minimized they will all be contracted to the same
state.